Goto

Collaborating Authors

 reconstruction task





On the Temporality for Sketch Representation Learning

Junior, Marcelo Isaias de Moraes, Ponti, Moacir Antonelli

arXiv.org Artificial Intelligence

Sketches are simple human hand-drawn abstractions of complex scenes and real-world objects. Although the field of sketch representation learning has advanced significantly, there is still a gap in understanding the true relevance of the temporal aspect to the quality of these representations. This work investigates whether it is indeed justifiable to treat sketches as sequences, as well as which internal orders play a more relevant role. The results indicate that, although the use of traditional positional encodings is valid for modeling sketches as sequences, absolute coordinates consistently outperform relative ones. Furthermore, non-autoregressive decoders outperform their autoregressive counterparts. Finally, the importance of temporality was shown to depend on both the order considered and the task evaluated.


REFRAG: Rethinking RAG based Decoding

Lin, Xiaoqiang, Ghosh, Aritra, Low, Bryan Kian Hsiang, Shrivastava, Anshumali, Mohan, Vijai

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated remarkable capabilities in leveraging extensive external knowledge to enhance responses in multi-turn and agentic applications, such as retrieval-augmented generation (RAG). However, processing long-context inputs introduces significant system latency and demands substantial memory for the key-value cache, resulting in reduced throughput and a fundamental trade-off between knowledge enrichment and system efficiency. While minimizing latency for long-context inputs is a primary objective for LLMs, we contend that RAG require specialized consideration. In RAG, much of the LLM context consists of concatenated passages from retrieval, with only a small subset directly relevant to the query. These passages often exhibit low semantic similarity due to diversity or deduplication during re-ranking, leading to block-diagonal attention patterns that differ from those in standard LLM generation tasks. Based on this observation, we argue that most computations over the RAG context during decoding are unnecessary and can be eliminated with minimal impact on performance. To this end, we propose REFRAG, an efficient decoding framework that compresses, senses, and expands to improve latency in RAG applications. By exploiting the sparsity structure, we demonstrate a 30.85 the time-to-first-token acceleration (3.75 improvement to previous work) without loss in perplexity. In addition, our optimization framework for large context enables REFRAG to extend the context size of LLMs by 16. We provide rigorous validation of REFRAG across diverse long-context tasks, including RAG, multi-turn conversations, and long document summarization, spanning a wide range of datasets. Experimental results confirm that REFRAG delivers substantial speedup with no loss in accuracy compared to LLaMA models and other state-of-the-art baselines across various context sizes.


Training LLMs to be Better Text Embedders through Bidirectional Reconstruction

Su, Chang, Shi, Dengliang, Huang, Siyuan, Du, Jintao, Meng, Changhua, Cheng, Yu, Wang, Weiqiang, Lin, Zhouhan

arXiv.org Artificial Intelligence

Large language models (LLMs) have increasingly been explored as powerful text embedders. Existing LLM-based text embedding approaches often leverage the embedding of the final token, typically a reserved special token such as [EOS]. However, these tokens have not been intentionally trained to capture the semantics of the whole context, limiting their capacity as text embeddings, especially for retrieval and re-ranking tasks. We propose to add a new training stage before contrastive learning to enrich the semantics of the final token embedding. This stage employs bidirectional generative reconstruction tasks, namely EBQ2D (Embedding-Based Query-to-Document) and EBD2Q (Embedding-Based Document-to-Query), which interleave to anchor the [EOS] embedding and reconstruct either side of Query-Document pairs. Experimental results demonstrate that our additional training stage significantly improves LLM performance on the Massive Text Embedding Benchmark (MTEB), achieving new state-of-the-art results across different LLM base models and scales.





Robust Sparse Bayesian Learning Based on Minimum Error Entropy for Noisy High-Dimensional Brain Activity Decoding

Li, Yuanhao, Chen, Badong, Bai, Wenjun, Koike, Yasuharu, Yamashita, Okito

arXiv.org Artificial Intelligence

Objective: Sparse Bayesian learning provides an effective scheme to solve the high-dimensional problem in brain signal decoding. However, traditional assumptions regarding data distributions such as Gaussian and binomial are potentially inadequate to characterize the noisy signals of brain activity. Hence, this study aims to propose a robust sparse Bayesian learning framework to address noisy highdimensional brain activity decoding. Methods: Motivated by the commendable robustness of the minimum error entropy (MEE) criterion for handling complex data distributions, we proposed an MEE-based likelihood function to facilitate the accurate inference of sparse Bayesian learning in analyzing noisy brain datasets. Results: Our proposed approach was evaluated using two high-dimensional brain decoding tasks in regression and classification contexts, respectively. The experimental results showed that, our approach can realize superior decoding metrics and physiological patterns than the conventional and state-of-the-art methods. Conclusion: Utilizing the proposed MEE-based likelihood model, sparse Bayesian learning is empowered to simultaneously address the challenges of noise and high dimensionality in the brain decoding task. Significance: This work provides a powerful tool to realize robust brain decoding, advancing biomedical engineering applications such as brain-computer interface.